This report explores a dataset containing the attributes of 4,898 white wines.
## 'data.frame': 4898 obs. of 13 variables:
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## $ quality.levels : Factor w/ 7 levels "3","4","5","6",..: 4 4 4 4 4 4 4 4 4 4 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 3.800 Min. :0.0800 Min. :0.0000 Min. : 0.600
## 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700 1st Qu.: 1.700
## Median : 6.800 Median :0.2600 Median :0.3200 Median : 5.200
## Mean : 6.855 Mean :0.2782 Mean :0.3342 Mean : 6.391
## 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900 3rd Qu.: 9.900
## Max. :14.200 Max. :1.1000 Max. :1.6600 Max. :65.800
##
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.00900 Min. : 2.00 Min. : 9.0
## 1st Qu.:0.03600 1st Qu.: 23.00 1st Qu.:108.0
## Median :0.04300 Median : 34.00 Median :134.0
## Mean :0.04577 Mean : 35.31 Mean :138.4
## 3rd Qu.:0.05000 3rd Qu.: 46.00 3rd Qu.:167.0
## Max. :0.34600 Max. :289.00 Max. :440.0
##
## density pH sulphates alcohol
## Min. :0.9871 Min. :2.720 Min. :0.2200 Min. : 8.00
## 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100 1st Qu.: 9.50
## Median :0.9937 Median :3.180 Median :0.4700 Median :10.40
## Mean :0.9940 Mean :3.188 Mean :0.4898 Mean :10.51
## 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500 3rd Qu.:11.40
## Max. :1.0390 Max. :3.820 Max. :1.0800 Max. :14.20
##
## quality quality.levels
## Min. :3.000 3: 20
## 1st Qu.:5.000 4: 163
## Median :6.000 5:1457
## Mean :5.878 6:2198
## 3rd Qu.:6.000 7: 880
## Max. :9.000 8: 175
## 9: 5
The original dataset consists of 12 variables and 4,898 observations. quality.levels, the 13th column above, is a factor version of quality that I added.
Note: quality is scored on a scale from 0 to 10, so I set the histogram binwidth to 0.5 to show the distribution clearly.
The distribution is clear: most wines are rated between 5 and 7. Which ingredients influence quality the most, and does more or less of them help? What makes a wine rate very highly?
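The counts in the quality.levels summary above make the "5 to 7" observation easy to quantify. Below is a minimal sketch in Python, used here only for illustration since the report itself uses R. It hard-codes those printed counts and computes the share of wines in a given set of scores.

```python
# Counts per quality score, copied from the quality.levels summary output.
QUALITY_COUNTS = {3: 20, 4: 163, 5: 1457, 6: 2198, 7: 880, 8: 175, 9: 5}


def share(levels, counts=QUALITY_COUNTS):
    """Return the fraction of wines whose quality score is in `levels`."""
    total = sum(counts.values())
    return sum(counts.get(level, 0) for level in levels) / total


if __name__ == "__main__":
    print(f"total wines: {sum(QUALITY_COUNTS.values())}")
    print(f"rated 5-7:   {share({5, 6, 7}):.3f}")
    print(f"rated 8-9:   {share({8, 9}):.3f}")
```

The counts sum to 4,898. About 92.6% of the wines are rated 5 to 7, while only 180 wines (about 3.7%) are rated 8 or 9.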
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
Most wines fall within a fairly narrow range of fixed.acidity, volatile.acidity, and citric.acid. The citric.acid distribution shows an unusual spike at around 0.5.
## Mode FALSE TRUE
## logical 4879 19
19 wines have a citric.acid value of 0. Does the absence of citric acid affect the quality of the wine?
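One way to approach this question is to compare the mean quality of the zero-citric-acid wines with the rest. Below is a minimal Python sketch (illustrative only; the report uses R). It runs on hypothetical toy rows of (citric.acid, quality), not the real data.

```python
from statistics import mean

# Hypothetical toy rows of (citric.acid, quality); the real data has 4898 rows.
rows = [(0.0, 5), (0.0, 6), (0.36, 6), (0.34, 6), (0.40, 7), (0.0, 4)]


def mean_quality_by_zero_citric(rows):
    """Split wines on citric.acid == 0 and return (n_zero, mean_zero, mean_other)."""
    zero = [q for c, q in rows if c == 0]
    other = [q for c, q in rows if c != 0]
    if not zero or not other:
        raise ValueError("both groups need at least one wine")
    return len(zero), mean(zero), mean(other)


print(mean_quality_by_zero_citric(rows))
```

With only 19 zero-citric-acid wines in the real data, any difference in means would need to be checked against noise before drawing conclusions.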
##
## 0.6 0.7 0.8 0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45
## 2 7 25 39 4 93 1 146 3 187 3 147 2 184 4
## 1.5 1.55 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95
## 142 2 165 2 99 1 99 3 59 2
Among the values shown, the most frequent residual.sugar levels are 1.1 to 1.6 g/dm^3 (for example, 187 wines at 1.2 and 184 at 1.4). But are these levels the best choice for quality?
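The frequency table above can be ranked directly. Below is a small Python sketch (illustrative only; the report uses R). It copies the printed counts into a dictionary and returns the most common values.

```python
# residual.sugar value (g/dm^3) -> number of wines, copied from the table above.
SUGAR_COUNTS = {
    0.6: 2, 0.7: 7, 0.8: 25, 0.9: 39, 0.95: 4, 1.0: 93, 1.05: 1, 1.1: 146,
    1.15: 3, 1.2: 187, 1.25: 3, 1.3: 147, 1.35: 2, 1.4: 184, 1.45: 4,
    1.5: 142, 1.55: 2, 1.6: 165, 1.65: 2, 1.7: 99, 1.75: 1, 1.8: 99,
    1.85: 3, 1.9: 59, 1.95: 2,
}


def most_common(counts, k):
    """Return the k values with the highest counts, most frequent first."""
    return sorted(counts, key=counts.get, reverse=True)[:k]


print(most_common(SUGAR_COUNTS, 6))
```

The six most frequent values are 1.2, 1.4, 1.6, 1.3, 1.1, and 1.5. So 1.3 (147 wines) belongs in the same group as its neighbours.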
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
The chlorides distribution has a long right tail, but most values lie between 0.036 and 0.05 g/dm^3 (the interquartile range), with a mean of 0.04577.
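A common way to make "long tail" concrete is Tukey's 1.5 * IQR fence. This is a standard rule of thumb, not one the report itself uses. The Python sketch below (illustrative only) applies it to the chlorides summary values printed above.

```python
# Summary statistics of chlorides (g/dm^3), copied from the summary above.
Q1, MEDIAN, MEAN, Q3, MAX = 0.036, 0.043, 0.04577, 0.050, 0.346


def tukey_upper_fence(q1, q3, k=1.5):
    """Upper outlier fence q3 + k * IQR (Tukey's rule, k=1.5 by convention)."""
    return q3 + k * (q3 - q1)


fence = tukey_upper_fence(Q1, Q3)
print(f"IQR = {Q3 - Q1:.3f}, upper fence = {fence:.3f}, max = {MAX}")
print(f"mean > median (right skew): {MEAN > MEDIAN}")
```

The upper fence is 0.071, so the maximum of 0.346 lies far beyond it. The mean (0.04577) sitting above the median (0.043) is also consistent with a right tail.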
In this section, I introduced a new attribute, ratio.free.sulfur.dioxide, the ratio of free to total sulfur dioxide. The ratio typically lies between about 0.19 and 0.32, and free.sulfur.dioxide itself typically lies between about 23 and 46 mg/dm^3.
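The report does not show how ratio.free.sulfur.dioxide was computed. This sketch assumes it is free.sulfur.dioxide divided by total.sulfur.dioxide, which matches the name and the 0.19 to 0.32 range. It computes the ratio for the first four rows of the str() output, in Python for illustration only.

```python
def ratio_free_sulfur_dioxide(free, total):
    """Assumed definition: share of total SO2 that is free (both in mg/dm^3)."""
    if total <= 0:
        raise ValueError("total.sulfur.dioxide must be positive")
    return free / total


# First four (free, total) pairs from the str() output above.
pairs = [(45, 170), (14, 132), (30, 97), (47, 186)]
ratios = [round(ratio_free_sulfur_dioxide(f, t), 3) for f, t in pairs]
print(ratios)
```

This prints `[0.265, 0.106, 0.309, 0.253]`. Three of these four wines fall inside or near the typical range described above.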
Apart from alcohol, the distributions of density, pH, and sulphates look roughly normal. We'll explore the relationships between them next.
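Whether a distribution "looks normal" can also be checked numerically. Below is a minimal Python sketch (illustrative only; the report relies on histograms) of the moment-based sample skewness g1. Values near 0 suggest a symmetric distribution, and positive values suggest a longer right tail.

```python
def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5; near 0 for symmetric data."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric toy data
print(sample_skewness([1, 1, 1, 10]))     # right-skewed toy data
```

Applied to each column of the real data, a clearly positive value for alcohol alongside values near 0 for density, pH, and sulphates would support the visual impression.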
There are 4,898 white wines in the dataset, with 12 features: fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol, and quality.
Acidity-related: fixed.acidity, volatile.acidity, citric.acid, pH;
Sulfur-related: free.sulfur.dioxide, total.sulfur.dioxide, sulphates;
Density-related: residual.sugar, chlorides, alcohol.
The main feature of interest is quality. My goal is to identify the main factors that contribute to the quality of white wine.
The other 11 variables (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol) will support this investigation.
I created quality.levels, a factor version of quality.
I have not applied any other operations to the data for now.
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.02269729 0.289180698
## volatile.acidity -0.02269729 1.00000000 -0.149471811
## citric.acid 0.28918070 -0.14947181 1.000000000
## residual.sugar 0.08902070 0.06428606 0.094211624
## chlorides 0.02308564 0.07051157 0.114364448
## free.sulfur.dioxide -0.04939586 -0.09701194 0.094077221
## total.sulfur.dioxide 0.09106976 0.08926050 0.121130798
## density 0.26533101 0.02711385 0.149502571
## pH -0.42585829 -0.03191537 -0.163748211
## sulphates -0.01714299 -0.03572815 0.062330940
## alcohol -0.12088112 0.06771794 -0.075728730
## quality -0.11366283 -0.19472297 -0.009209091
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.08902070 0.02308564 -0.0493958591
## volatile.acidity 0.06428606 0.07051157 -0.0970119393
## citric.acid 0.09421162 0.11436445 0.0940772210
## residual.sugar 1.00000000 0.08868454 0.2990983537
## chlorides 0.08868454 1.00000000 0.1013923521
## free.sulfur.dioxide 0.29909835 0.10139235 1.0000000000
## total.sulfur.dioxide 0.40143931 0.19891030 0.6155009650
## density 0.83896645 0.25721132 0.2942104109
## pH -0.19413345 -0.09043946 -0.0006177961
## sulphates -0.02666437 0.01676288 0.0592172458
## alcohol -0.45063122 -0.36018871 -0.2501039415
## quality -0.09757683 -0.20993441 0.0081580671
## total.sulfur.dioxide density pH
## fixed.acidity 0.091069756 0.26533101 -0.4258582910
## volatile.acidity 0.089260504 0.02711385 -0.0319153683
## citric.acid 0.121130798 0.14950257 -0.1637482114
## residual.sugar 0.401439311 0.83896645 -0.1941334540
## chlorides 0.198910300 0.25721132 -0.0904394560
## free.sulfur.dioxide 0.615500965 0.29421041 -0.0006177961
## total.sulfur.dioxide 1.000000000 0.52988132 0.0023209718
## density 0.529881324 1.00000000 -0.0935914935
## pH 0.002320972 -0.09359149 1.0000000000
## sulphates 0.134562367 0.07449315 0.1559514973
## alcohol -0.448892102 -0.78013762 0.1214320987
## quality -0.174737218 -0.30712331 0.0994272457
## sulphates alcohol quality
## fixed.acidity -0.01714299 -0.12088112 -0.113662831
## volatile.acidity -0.03572815 0.06771794 -0.194722969
## citric.acid 0.06233094 -0.07572873 -0.009209091
## residual.sugar -0.02666437 -0.45063122 -0.097576829
## chlorides 0.01676288 -0.36018871 -0.209934411
## free.sulfur.dioxide 0.05921725 -0.25010394 0.008158067
## total.sulfur.dioxide 0.13456237 -0.44889210 -0.174737218
## density 0.07449315 -0.78013762 -0.307123313
## pH 0.15595150 0.12143210 0.099427246
## sulphates 1.00000000 -0.01743277 0.053677877
## alcohol -0.01743277 1.00000000 0.435574715
## quality 0.05367788 0.43557472 1.000000000
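According to the correlation matrix:
Strongest correlates of quality: alcohol (0.44), density (-0.31), chlorides (-0.21), and total.sulfur.dioxide (-0.17); volatile.acidity (-0.19) is of similar strength.
Related to pH: fixed.acidity (-0.43), residual.sugar (-0.19), citric.acid (-0.16), and, more weakly, density (-0.09).
Strongest correlates of density: residual.sugar (0.84), alcohol (-0.78), and total.sulfur.dioxide (0.53).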
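Ranking the quality column of the matrix by absolute correlation makes the ordering explicit. Below is a small Python sketch (illustrative only; the report uses R's cor()). The values are copied from the matrix above.

```python
# Pearson correlation of each variable with quality, copied from the matrix above.
CORR_WITH_QUALITY = {
    "fixed.acidity": -0.113662831,
    "volatile.acidity": -0.194722969,
    "citric.acid": -0.009209091,
    "residual.sugar": -0.097576829,
    "chlorides": -0.209934411,
    "free.sulfur.dioxide": 0.008158067,
    "total.sulfur.dioxide": -0.174737218,
    "density": -0.307123313,
    "pH": 0.099427246,
    "sulphates": 0.053677877,
    "alcohol": 0.435574715,
}


def rank_by_strength(corrs):
    """Sort (name, r) pairs by absolute correlation, strongest first."""
    return sorted(corrs.items(), key=lambda kv: abs(kv[1]), reverse=True)


for name, r in rank_by_strength(CORR_WITH_QUALITY)[:5]:
    print(f"{name:22s} {r:+.3f}")
```

The top five are alcohol, density, chlorides, volatile.acidity, and total.sulfur.dioxide. All of them are weak to moderate (|r| < 0.5).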
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
Quality decreases as density increases, but the effect is modest (r = -0.31). A good wine tends to have a low density, around 0.98711 to 0.995.
Alcohol content tends to decrease as quality rises toward 5 and to increase as quality rises above 5. Because density is strongly negatively correlated with alcohol (r = -0.78), density tends to move in the opposite direction.
For good wines, alcohol typically ranges from 11.5% to 12.9%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.0 101.0 122.0 125.2 146.0 229.0
Good white wines have a total.sulfur.dioxide of about 125.2 mg/dm^3 on average (median 122, interquartile range 101 to 146).
Although we cannot be sure that free.sulfur.dioxide really influences quality (its correlation with quality is only 0.008), we can narrow it to a range of about 30 to 50 mg/dm^3.
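The report does not state which quality cut-off defines a "good" wine for summaries like the one above. This Python sketch (illustrative only) assumes a hypothetical threshold of 7. It reproduces R's summary() on that subset using toy rows. R's default quantiles (type 7) use the same interpolation as Python's `statistics.quantiles(..., method="inclusive")`.

```python
from statistics import mean, quantiles

GOOD_THRESHOLD = 7  # hypothetical cut-off for a "good" wine; not stated in the report


def summary(values):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, like R's summary() (type-7 quantiles)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, mean(values), q3, max(values)


def summary_for_good(rows, threshold=GOOD_THRESHOLD):
    """Summarize a feature over wines whose quality is at least `threshold`."""
    good = [value for value, quality in rows if quality >= threshold]
    return summary(good)


# Hypothetical toy (total.sulfur.dioxide, quality) rows, not the real data.
rows = [(170, 6), (132, 6), (97, 7), (186, 5), (120, 8), (110, 7), (140, 7)]
print(summary_for_good(rows))
```

On the toy rows this prints `(97, 106.75, 115.0, 116.75, 125.0, 140)`. Swapping in the real total.sulfur.dioxide, chlorides, or alcohol columns would give good-wine ranges like the ones quoted in this section.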
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.03100 0.03700 0.03816 0.04400 0.13500
Good wines contain only a small amount of salt (chlorides), about 0.031 to 0.044 g/dm^3.
A suitable pH appears to be about 3.0 to 3.4.
Quality is most strongly associated with the wine's density, alcohol content, chlorides, and total.sulfur.dioxide. A good white wine tends to have a low density, a high alcohol content, about 125 mg/dm^3 of total.sulfur.dioxide, and few chlorides.
Although pH is not strongly related to quality, it is related to fixed.acidity, citric.acid, residual.sugar, and density.
Looking more closely at density, its strongest correlates include residual.sugar and sulfur dioxide, and these variables also influence one another.
The feature most strongly correlated with quality is alcohol, followed by density and then chlorides.
Residual sugar is strongly correlated with density (r = 0.84); the points along the green line appear to be better wines than the others.
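For reference, here is a plain Python sketch (illustrative only; the report uses R's cor()) of the Pearson correlation behind figures like r = 0.84. It is applied to the first five rows shown by str(). Note that str() displays density rounded to three decimals.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# First five rows shown by str(): residual.sugar and (rounded) density.
sugar = [20.7, 1.6, 6.9, 8.5, 8.5]
density = [1.001, 0.994, 0.995, 0.996, 0.996]
print(f"r = {pearson(sugar, density):.2f}")
```

On these five rounded rows r comes out at about 0.99, much higher than the 0.84 computed over all 4,898 wines. This is a reminder that a handful of points is not a reliable estimate.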
pH is negatively correlated with fixed.acidity (r = -0.43). The best-rated white wines lie roughly around the mean values of pH and fixed.acidity.
Similarly, wines with citric.acid near the mean tend to receive higher ratings.
A wine with a high alcohol content and a low density tends to be rated highly.
This part is quite interesting: some people like a little salt in their drink, but most people prefer wines with few chlorides.
pH is influenced by acids and sugar. The plots suggest that acid levels near the mean produce a nice white wine, and the best-rated wines have a pH of about 3.0 to 3.4, as seen in the bivariate plots.
Density is strongly influenced by sugar. People seem to prefer white wines with a low density that still contain some sugar.
Chlorides vs. alcohol is quite interesting: some people like a little salt in their drink, but most people prefer wines with few chlorides.
This model was very weak (its R-squared was only 0.28), so I decided to drop it. Every expert has their own taste, and with the limited data we cannot build an accurate model of the true quality of a wine, only an approximate one. We can, however, draw some statistical conclusions about what a good white wine looks like.
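The report does not show the model itself. The sketch below assumes the 0.28 refers to the coefficient of determination (R-squared) of a linear model. It shows how that quantity is computed, in Python for illustration only. It also uses the fact that for a single-predictor least-squares line, R-squared equals the squared Pearson correlation.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y is constant")
    return 1 - ss_res / ss_tot


# For a single-predictor least-squares fit, R^2 equals the squared Pearson r.
r_alcohol = 0.435574715  # cor(alcohol, quality) from the correlation matrix
print(f"R^2 from alcohol alone: {r_alcohol ** 2:.3f}")
```

Alcohol alone would explain about 19% of the variance in quality (0.4356 squared). So a multi-variable model reaching only 0.28 adds relatively little on top of it.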
In this dataset, most white wines are rated 5 or 6. Because very few wines are rated 8 or 9, the dataset is not a reliable basis for analyzing top-quality wines; it is better suited to wines rated 5 to 7.
This plot combines the three variables that contribute most to the rating of white wine: density, alcohol, and chlorides.
This plot shows the relationship between alcohol, density, and quality. Good white wines cluster around the mean density, with an alcohol content of about 10% to 13%. Still, taste depends on the individual.
From this dataset, the best white wines share the following attributes:
alcohol rate recommended to be 11.5% to 12.9%
density recommended to be 0.98711 to 0.995
chlorides recommended to be 0.031 to 0.044 g/dm^3
pH recommended to be 3.0 to 3.4
free.sulfur.dioxide recommended to be 30 to 50 mg/dm^3
total.sulfur.dioxide recommended to be about 125.2 mg/dm^3
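The recommended ranges above can be turned into a simple checklist. Below is a minimal Python sketch (illustrative only). total.sulfur.dioxide is recommended as a single value, so as an assumption the sketch substitutes the good-wine interquartile range of 101 to 146 from the earlier summary output. It checks the first wine listed in the str() output.

```python
# Recommended ranges from the list above. total.sulfur.dioxide is given as a
# single value (125.2 mg/dm^3); as an assumption we use the good-wine
# interquartile range 101 to 146 from the earlier summary instead.
RECOMMENDED = {
    "alcohol": (11.5, 12.9),             # % vol
    "density": (0.98711, 0.995),         # g/cm^3
    "chlorides": (0.031, 0.044),         # g/dm^3
    "pH": (3.0, 3.4),
    "free.sulfur.dioxide": (30, 50),     # mg/dm^3
    "total.sulfur.dioxide": (101, 146),  # mg/dm^3 (assumed range)
}


def out_of_range(wine, ranges=RECOMMENDED):
    """Return the names of features that fall outside the recommended ranges."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= wine[name] <= hi]


# First wine from the str() output above (its quality is 6).
wine = {"alcohol": 8.8, "density": 1.001, "chlorides": 0.045, "pH": 3.0,
        "free.sulfur.dioxide": 45, "total.sulfur.dioxide": 170}
print(out_of_range(wine))
```

For this wine the check flags alcohol, density, chlorides, and total.sulfur.dioxide, even though the wine is rated 6. The ranges describe typical good wines rather than strict rules.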
These ranges can help you choose an enjoyable white wine.
The dataset suggests that a good wine requires a precise balance of acid, sugar, salt, and sulfur dioxide. These ranges can only serve as recommendations.
However, because only 180 white wines are rated 8 or 9, we cannot build an accurate model to analyze the result.
For future work, we should gather more data on high-quality white wines to find a more reliable way to judge what makes a white wine good.